Abstract — There is a growing interest in computer science,
engineering, and mathematics in modeling signals in terms
of unions of subspaces and manifolds. Subspace segmentation
and clustering of high dimensional data drawn from a union
of subspaces are especially important, with many practical
applications in computer vision, image and signal processing,
communications, and information theory. This paper presents
a clustering algorithm for high dimensional data that come
from a union of lower dimensional subspaces of equal and
known dimensions. Such cases occur in many data clustering
problems, such as motion segmentation and face recognition. The
algorithm is reliable in the presence of noise and, applied to the
Hopkins 155 Dataset, it generates the best results to date for
motion segmentation. The two motion, three motion, and overall
segmentation rates for the video sequences are 99.43%, 98.69%,
and 99.24%, respectively.

Index Terms — Subspace segmentation, motion segmentation,
data clustering.
I. Introduction



The problem of subspace clustering is to find a nonlinear
model of the form $\mathcal{U} = \bigcup_{i \in I} S_i$, where $\{S_i\}_{i \in I}$ is a set of
subspaces that is nearest to a set of data $W = \{w_1, \ldots, w_N\} \subset \mathbb{R}^d$.
The model can then be used to classify the data W into
classes called clusters.
In many engineering and mathematics applications, data
live in a union of low dimensional subspaces [1], [2],
[3], [4]. For instance, consider a moving affine camera
that captures F frames of a scene that contains multiple
moving objects. Let p be a point of one of these objects
and let $x_i(p), y_i(p)$ be the coordinates of p in frame i.
Define the trajectory vector of p as the vector
$w(p) = (x_1(p), y_1(p), x_2(p), y_2(p), \ldots, x_F(p), y_F(p))^t$ in $\mathbb{R}^{2F}$. It can
be shown that the trajectory vectors of all points of an object
in a video belong to a vector subspace of $\mathbb{R}^{2F}$ of dimension
no larger than 4 [5], [6]. Thus, trajectory vectors in videos can
be modeled by a union $\mathcal{M} = \bigcup_{i \in I} V_i$ of $|I|$ subspaces, where $|I|$ is
the number of moving objects (the background is itself a motion).
It can also be shown that human facial motion and other non-rigid
motions can be approximated by linear subspaces [?],
[?]. Another clustering problem that can be modeled as a union
of subspaces is face recognition. Specifically, the set of
all two dimensional images of a given face i, obtained under
different illuminations and facial positions, can be modeled as
a set of vectors belonging to a low dimensional subspace $S_i$
living in a higher dimensional space $\mathbb{R}^d$ [?], [?], [?]. A set of
such images from different faces is then a union $\mathcal{U} = \bigcup_{i \in I} S_i$.
Similar nonlinear models arise in sampling theory, where $\mathbb{R}^d$
is replaced by an infinite dimensional Hilbert space $\mathcal{H}$, e.g.,
$L^2(\mathbb{R}^d)$ [?], [?], [?], [?].

A. Subspace Segmentation Problem 

The goal of subspace clustering is to identify all of the
subspaces that a set of data $W = \{w_1, \ldots, w_N\} \subset \mathbb{R}^d$ is drawn
from and to assign each data point $w_i$ to the subspace it belongs
to. The number of subspaces, their dimensions, and a basis for
each subspace are to be determined. The subspace clustering
or segmentation problem can be stated as follows:

Let $\mathcal{U} = \bigcup_{i=1}^{M} S_i$, where $\{S_i \subset \mathcal{H}\}_{i=1}^{M}$ is a set
of subspaces of a Hilbert space $\mathcal{H}$. Let
$W = \{w_j \in \mathcal{H}\}_{j=1}^{N}$ be a set of data points drawn from
$\mathcal{U}$. Then,

1) determine the number of subspaces M,

2) determine the set of dimensions $\{d_i\}_{i=1}^{M}$,

3) find an orthonormal basis for each subspace
$S_i$,

4) collect the data points belonging to the
same subspace into the same cluster.

Note that the data may often be corrupted by noise, may
have outliers, or may be incomplete, e.g., there
may be missing data points. In some subspace clustering
problems, the number M of subspaces or the dimensions of
the subspaces $\{d_i\}_{i=1}^{M}$ are known. A number of approaches
have been devised to solve the problem above or some of its
special cases.

1) Sparsity Methods: Elhamifar et al. developed an algorithm
for linear and affine subspace clustering using sparse
representation of vectors [12], [13]. This method, combined
with spectral clustering, gives good results for motion
segmentation, and it is more general than Eldar's work in
compressed sensing [14]. Another method related to compressed
sensing, by Liu et al. [15], [16], finds the lowest rank
representation of the data matrix. The lowest rank representation
is then used to define the similarity of an undirected graph,
which is then followed by spectral clustering. Favaro et al.
[?] extend [12], [13], [?], [?].

2) Algebraic Methods: Algebraic methods have also been
used for solving the subspace clustering problem. The
Generalized Principal Component Analysis (GPCA) is one such
method [?], [?], [18], and it can distinguish subspaces of
different dimensions. Since it is algebraic, it is computationally
inexpensive; however, its complexity increases exponentially
as the number of subspaces and their dimensions increase. It
is also sensitive to noise and outliers. The Robust Algebraic
Segmentation is a more specialized algebraic method developed
by Rao et al. [19] to partition image correspondences
according to the motions in a 3-D dynamic scene (that contains
3-D rigid bodies and 2-D planar structures) under perspective
camera projection.

3) Iterative and Statistical Methods: Iterative methods have
also been employed for the subspace clustering problem. For
example, the nonlinear least squares method [10], [?] and K-subspaces
[20] start with an initial estimate of the subspaces (or of
the bases of the subspaces). Then, a cost function reflecting
the "distance" of a point to each subspace is computed, and the
point is assigned to its closest subspace. After that, each cluster
of data is used to re-estimate each subspace. The procedure is
repeated until the segmentation of the data points does not change.
These methods, however, are sensitive to the initialization and
require a good initial partition for convergence to a global
minimum.

The statistical methods, such as Multi Stage Learning (MSL)
[?], [21], are typically based on Expectation Maximization
(EM) [22]. The union of subspaces is modeled by a mixture
of probability distributions; for example, each subspace is
modeled by a Gaussian distribution. The model parameters
are then estimated using Maximum Likelihood Estimation.
This is done by a two-step process that optimizes the
log-likelihood of the model, which depends on some hidden
(latent) variables. In the E-step (Expectation), the expectation of
the log-likelihood with respect to the latent variables is computed
using the current estimate of the model parameters. In the M-step
(Maximization), the model parameters are updated by maximizing
this expected log-likelihood. As in the case of the iterative methods,
statistical methods depend heavily on the initialization of the model
parameters or of the segmentation, and they assume that the number
of subspaces as well as their dimensions are known.

The Random Sample Consensus (RANSAC) [23], which
has been applied to numerous computer vision problems, is
successful in dealing with noise and outliers. However, it is a
specialized algorithm and assumes that the subspaces have the
same dimension and that this dimension is known.

4) Spectral Clustering Methods: Spectral clustering [24] is
often used in conjunction with other methods as the final step
in clustering. Some of the latest subspace clustering algorithms
(such as [12], [13], [25]) aim at defining an appropriate
similarity matrix between data points, which can then be processed
further by the spectral clustering method. An
application of spectral clustering to motion segmentation can
be found in [26]. Spectral curvature clustering [27], [28] is a
variant of spectral clustering. [?] provides a spectral clustering
algorithm that aims at reducing the computational complexity.
The motion segmentation algorithm developed by Yan and
Pollefeys [29] first estimates a local linear manifold for each
trajectory and then computes an affinity matrix based on
the principal subspace angles between each pair of local linear
manifolds. The algorithm then uses spectral clustering for
segmenting the trajectories of independent, articulated, rigid,
and non-rigid body motions. [30] gives a detailed treatment of
various related algorithms.



B. Motion Segmentation Problem 

The appendix gives a detailed treatment of motion segmentation
as a special case of the subspace segmentation problem.
First, a data matrix $W_{2F \times N}$ is constructed using N feature
points that are tracked across F frames. Then, each column
of W (i.e., the trajectory vector of a feature point) is treated
as a data point, and it is shown that all of the data points
that correspond to the same moving object lie in an at most
4-dimensional subspace of $\mathbb{R}^{2F}$.
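As a concrete illustration of how the data matrix is assembled, the Python sketch below stacks the per-frame image coordinates of each tracked point into one column, giving a $2F \times N$ matrix whose columns are the trajectory vectors $w(p)$. The input layout `tracks[p][f] = (x, y)` and the function name `trajectory_matrix` are hypothetical choices for this example; they are not the Hopkins 155 file format.

```python
from typing import List, Sequence, Tuple

def trajectory_matrix(tracks: Sequence[Sequence[Tuple[float, float]]]) -> List[List[float]]:
    """Build the 2F x N matrix W whose p-th column is w(p) = (x_1, y_1, ..., x_F, y_F)^t.

    tracks[p][f] holds the (x, y) image coordinates of feature point p in frame f
    (a hypothetical input layout chosen for this sketch).
    """
    n_points = len(tracks)
    n_frames = len(tracks[0])
    if any(len(track) != n_frames for track in tracks):
        raise ValueError("every feature point must be tracked in all F frames")
    W = [[0.0] * n_points for _ in range(2 * n_frames)]
    for p, track in enumerate(tracks):
        for f, (x, y) in enumerate(track):
            W[2 * f][p] = float(x)      # row 2f: x-coordinate in frame f
            W[2 * f + 1][p] = float(y)  # row 2f + 1: y-coordinate in frame f
    return W

# Two points tracked over F = 3 frames give a 6 x 2 matrix.
W = trajectory_matrix([[(1, 2), (3, 4), (5, 6)], [(0, 1), (0, 2), (0, 3)]])
print(W)
```

Keeping the x and y coordinates of a frame in adjacent rows matches the ordering of $w(p)$ used in the Introduction; any fixed ordering would do, as long as it is the same for every point.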



C. Paper Contributions 

1) This paper presents a clustering algorithm for high
dimensional data that are drawn from a union of low
dimensional subspaces of equal and known dimensions.
The algorithm is applicable to the motion segmentation
problem and uses some fundamental linear algebra concepts.
Some of our ideas are similar to those of Yan
and Pollefeys described above in Section I-A.4. However,
our algorithm differs fundamentally from theirs, as
described below:

• Yan and Pollefeys' method estimates a subspace $S_i$
for each point $x_i$, and then computes the principal
angles between those subspaces as an affinity measure.
In our work, we also estimate a subspace for
each point; however, these local subspaces are used
differently. They are used to compute the distance
between each point $x_j$ and the local subspace $S_i$ of
the data point $x_i$.

• In their method, an exponential function is used for the
affinity of two points $x_i$ and $x_j$, and this exponential
function depends on the principal angles between
the subspaces $S_i$ and $S_j$ that are associated with $x_i$
and $x_j$, respectively. In our case, the affinity measure
is different. We first find the distance between
$x_j$ and $S_i$ and then apply a threshold, computed
from the data, to obtain a binary similarity matrix
for all data points.

• The method of Yan and Pollefeys uses spectral
clustering on the normalized graph Laplacian matrix
of the similarity matrix they propose. However, our
approach does not use spectral clustering on
the normalized graph Laplacian of our similarity
matrix. Instead, our binary similarity
matrix converts the original data clustering problem
into a simpler clustering of data from 1-dimensional
subspaces, which can be solved by any traditional
data clustering algorithm.

2) Our algorithm is reliable in the presence of noise and,
applied to the Hopkins 155 Dataset, it generates the
best results to date for motion segmentation. The two
motion, three motion, and overall segmentation rates for
the video sequences are 99.43%, 98.69%, and 99.24%,
respectively.

3) Many of the subspace segmentation algorithms use SVD
to represent the data matrix W as $W = U \Sigma V^t$ and
then replace W with the first r rows of $V^t$, where r is
the effective rank of W. This paper provides a formal
justification for this in Proposition 1.

D. Paper Organization 

The organization of the paper is as follows: Section II gives
some preliminaries. In Section III we devise an algorithm
for the subspace segmentation problem in the special case
where the subspaces have equal and known dimensions. In
Section IV we apply our algorithm to the motion segmentation
problem, test it on the Hopkins 155 Dataset, explain the
experimental procedure, and present the experimental results.

II. Preliminaries 

In this section, we present Proposition 1, which will be used
later to justify that a data matrix W whose columns represent
data points can be replaced with a lower rank matrix after
computing its SVD (i.e., $W = U \Sigma V^t$). It can be paraphrased
by saying that for any matrices A, B, C with C = AB, a cluster of the
columns of B is also a cluster of the columns of C. A
cluster of C, however, is not necessarily a cluster of B, unless A
has full rank:

Proposition 1. Let A and B be m × n and n × k matrices.
Let C = AB. Assume $J \subset \{1, 2, \ldots, k\}$.

1) If $b_i \in \operatorname{span}\{b_j : j \in J\}$, then $c_i \in \operatorname{span}\{c_j : j \in J\}$.

2) If A is full rank and m > n, then
$b_i \in \operatorname{span}\{b_j : j \in J\} \iff c_i \in \operatorname{span}\{c_j : j \in J\}$.

Proof. The first part can be proved by the simple matrix
manipulation

$$C = AB = A[b_1 \cdots b_k] = [Ab_1 \cdots Ab_k] = [c_1 \cdots c_k],$$
so that if $b_i = \sum_{j \in J} \alpha_j b_j$, then
$$c_i = Ab_i = A \sum_{j \in J} \alpha_j b_j = \sum_{j \in J} \alpha_j c_j. \qquad \text{(II.1)}$$

For the second part, we note that $A^t A$ is invertible and
$(A^t A)^{-1} A^t C = B$. We then apply part 1 of the proposition.
Note that the same result clearly holds if A is invertible. □

The proposition above suggests that, for the purpose of
column clustering, we can replace a matrix C by a matrix B
as long as A has the stated properties. Thus, by choosing A
appropriately, the matrix C can be replaced by a more suitable
matrix B, e.g., B has fewer rows, is better conditioned, or is
in a format where columns can be easily clustered.
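Proposition 1 can be checked on a small example. The sketch below uses arbitrarily chosen integer matrices (not from the paper): it verifies part 1 for a linear relation among the columns of B, and shows why part 2 needs A to have full rank, since a rank-one A maps two independent columns of B to identical columns of C.

```python
from typing import List

Matrix = List[List[int]]

def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Return the product C = AB of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def column(M: Matrix, j: int) -> List[int]:
    """Return the j-th column of M."""
    return [row[j] for row in M]

# Part 1: b_3 = 2 b_1 - b_2, so c_3 = A b_3 must equal 2 c_1 - c_2 for any A.
A = [[1, 0], [0, 1], [1, 1]]            # 3 x 2, full column rank
B = [[1, 0, 2], [0, 1, -1]]             # columns b_1, b_2, b_3
C = matmul(A, B)
c1, c2, c3 = column(C, 0), column(C, 1), column(C, 2)
print(c3, [2 * u - v for u, v in zip(c1, c2)])   # [2, -1, 1] [2, -1, 1]

# Part 2 fails without full rank: b_1 = e_1 and b_2 = e_2 are independent,
# but a rank-one A sends them to the same column, so c_2 lies in span{c_1}.
A_rank1 = [[1, 1], [2, 2], [0, 0]]
C2 = matmul(A_rank1, [[1, 0], [0, 1]])
print(column(C2, 0), column(C2, 1))     # [1, 2, 0] [1, 2, 0]
```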

III. Nearness to Local Subspace Approach 

In this section, we develop a specialized algorithm for subspace
segmentation and data clustering when the dimensions
of the subspaces are equal and known. First, a local subspace is
estimated for each data point. Then, the distances between the
local subspaces and points are computed and a distance matrix
is generated. This is followed by the construction of a binary
similarity matrix by applying a data-driven threshold to the
distance matrix. Finally, the segmentation problem is converted
to a one-dimensional data clustering problem. The precise
steps are described in Algorithm 1 and in the explanation that
follows.



A. Algorithm for Subspace Segmentation for Subspaces of 
Equal and Known Dimensions 

The algorithm for subspace segmentation is given in Algorithm
1. We assume that the subspaces have dimension d (for
motion segmentation, d = 4). The details of the various steps
are described below.

Algorithm 1 Subspace Segmentation

Require: The m × N data matrix W whose columns are
drawn from subspaces of dimension d
Ensure: Clustering of the feature points.
1: Compute the SVD of W as in Equation (III.1).
2: Estimate the rank of W (denoted by r) if it is not
known, for example, using Equation (III.2) or any other
appropriate choice.
3: Compute $(V_r)^t$ consisting of the first r rows of $V^t$.
4: Normalize the columns of $(V_r)^t$.
5: Replace the data matrix W with $(V_r)^t$.
6: Find the angles between the column vectors of W and
represent them as a matrix. {i.e., $\arccos(W^t W)$}
7: Sort the angles and find the closest neighbors of each column
vector.
8: for all column vectors $x_i$ of W do
9:   Find the local subspace for the set consisting of $x_i$ and
its k neighbors (see Equation (III.3)). {Theoretically, k is
at least d − 1. We can use the least squares approximation
for the subspace (see the section Local Subspace Estimation).
Let $A_i$ denote the matrix whose columns form
an orthonormal basis for the local subspace associated
with $x_i$.}
10: end for
11: for i = 1 to N do
12:   for j = 1 to N do
13:     define $H = (d_{ij}) = \left(\|x_j - A_i A_i^t x_j\|_p + \|x_i - A_j A_j^t x_i\|_p\right)/2$
14:   end for
15: end for {Build the distance matrix}
16: Sort the entries of the N × N matrix H from smallest to
highest values into the vector h and set the threshold η to
the value of the T-th entry of the sorted and normalized
vector h, where T is such that $\|\chi_{[T, N^2]} - h\|_2$ is minimized,
and where $\chi_{[T, N^2]}$ is the characteristic function of
the discrete set $[T, N^2]$.
17: Construct a similarity matrix S by setting all entries of H
less than threshold η to 1 and by setting all other entries
to 0. {Build the binary similarity matrix}
18: Normalize the rows of S using the $l_1$-norm to obtain $\tilde{S}$.
19: Perform the SVD $\tilde{S}^t = U_n \Sigma_n (V_n)^t$.
20: Cluster the columns of $\Sigma_n (V_n)^t$ using k-means. $\Sigma_n (V_n)^t$
is the projection onto the span of $U_n$.

Dimensionality Reduction and Normalization: Let W be an 
m x N data matrix whose columns are drawn from a union 
of subspaces of dimensions at most d, possibly perturbed by 
noise. In order to reduce the dimensionality of the problem, 
we compute the SVD of W 



$$W = U \Sigma V^t, \qquad \text{(III.1)}$$
where $U = [u_1 \; u_2 \; \cdots \; u_m]$ is an m × m matrix, $V = [v_1 \; v_2 \; \cdots \; v_N]$
is an N × N matrix, and Σ is an m × N
diagonal matrix with diagonal entries $\sigma_1, \ldots, \sigma_l$, where $l = \min\{m, N\}$.

To estimate the effective rank r of W when it is not
known, one can use the model selection algorithm [29]:

$$r = \arg\min_r \frac{\sigma_{r+1}^2}{\sum_{i=1}^{r} \sigma_i^2} + \kappa r, \qquad \text{(III.2)}$$

where $\sigma_j$ is the j-th singular value and κ is a suitable constant.
Another possible model selection algorithm can be found in
[31]. $U_r \Sigma_r (V_r)^t$ is the best rank-r approximation of $W = U \Sigma V^t$,
where $U_r$ is the matrix whose columns are the first r columns
of U, $\Sigma_r$ is the leading r × r block of Σ, and $(V_r)^t$ consists of the first r rows of $V^t$. In
the case of motion segmentation, if there are k independent
motions across the frames captured by a moving camera, the
rank of W is between 2(k + 1) and 4(k + 1).
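The rank-selection rule, which picks the r minimizing $\sigma_{r+1}^2 / \sum_{i=1}^{r} \sigma_i^2 + \kappa r$, takes only a few lines once the singular values are available. The sketch below assumes they are already sorted in decreasing order; the value κ = 0.01 in the usage line is purely illustrative and is not a value given in this paper.

```python
from typing import Sequence

def estimate_rank(sigma: Sequence[float], kappa: float) -> int:
    """Return the r minimizing sigma_{r+1}^2 / (sigma_1^2 + ... + sigma_r^2) + kappa * r.

    sigma holds the singular values in decreasing order (sigma[0] is sigma_1);
    r ranges over 1, ..., len(sigma) - 1 so that sigma_{r+1} exists.
    """
    if len(sigma) < 2:
        raise ValueError("need at least two singular values")
    best_r, best_cost = 1, float("inf")
    energy = 0.0                                   # running sum sigma_1^2 + ... + sigma_r^2
    for r in range(1, len(sigma)):
        energy += sigma[r - 1] ** 2
        cost = sigma[r] ** 2 / energy + kappa * r  # sigma[r] is sigma_{r+1}
        if cost < best_cost:
            best_r, best_cost = r, cost
    return best_r

# Four dominant singular values followed by two small ones suggest r = 4.
print(estimate_rank([10, 9, 8, 7, 0.1, 0.05], kappa=0.01))  # 4
```

The penalty κr discourages overfitting: with a large κ (e.g. 1.0 on the same data) the rule falls back to r = 1.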

We can now replace the data matrix W with the matrix $(V_r)^t$
that consists of the first r rows of $V^t$ (thereby reducing the
dimensionality of the data). This step is justified by Proposition 1,
applied with $A = U_r \Sigma_r$ and $B = (V_r)^t$.
Also, [?] discusses segmentation preserving projections
and states that the number of subspaces and their dimensions
are preserved by random projections, except for a zero measure
set of projections. This step also reduces additive noise,
especially in the case of light-tailed noise, e.g., Gaussian noise.
The number of subspaces corresponds to the number of moving
objects. Vidal et al. [32] use an alternative method (the power
method) for SVD to project incomplete motion data (trajectories)
onto a 5-dimensional subspace and then apply GPCA and spectral
clustering for subspace segmentation. Dimensionality reduction
corresponds to Steps 1, 2, and 3 in Algorithm 1.

Another type of data reduction is normalization.
Specifically, the columns of $(V_r)^t$ are normalized to lie
on the unit sphere $S^{r-1}$. By projecting
the subspace onto the unit sphere, we effectively reduce
the dimensionality of the data by one. Moreover,
normalization makes every column of the data matrix
contribute equally to the description of the subspaces. Note that the
normalization can be done by using $l_p$ norms of the columns
of $(V_r)^t$. This normalization procedure corresponds to Steps
4 and 5 in Algorithm 1.

Local Subspace Estimation: The data points (i.e., the
column vectors of $(V_r)^t$) that are close to each other are
likely to belong to the same subspace. For this reason,
we estimate a local subspace for each data point using its
closest neighbors. This can be done in different ways. For
example, if the $l_2$-norm is used for normalization, we can
find the angles between the points, i.e., we can compute
the matrix $\arccos(V_r (V_r)^t)$. Then we can sort the angles
and find the closest neighbors of each point. If we use the
$l_p$-norm for normalization, we can generate a distance matrix
$(a_{ij}) = (\|x_i - x_j\|_p)$ and then sort each column of the
distance matrix to find the neighbors of each $x_i$, which is the
i-th column of $(V_r)^t$.
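For the $l_2$-normalized case, the normalization and neighbor search can be sketched in plain Python as below: each data point is scaled onto the unit sphere, and the other points are ranked by the angle $\arccos(\langle x_i, x_j \rangle)$. The function names are illustrative, and the quadratic double loop is chosen for clarity rather than speed.

```python
import math
from typing import List

Vector = List[float]

def normalize_columns(points: List[Vector]) -> List[Vector]:
    """Scale each data point (a column of (V_r)^t) to unit l2 norm, i.e., onto the unit sphere."""
    out = []
    for x in points:
        norm = math.sqrt(sum(v * v for v in x))
        out.append([v / norm for v in x])
    return out

def nearest_neighbors(points: List[Vector], k: int) -> List[List[int]]:
    """For each unit vector x_i, return the indices of the k points x_j (j != i)
    with the smallest angle arccos(<x_i, x_j>)."""
    neighbors = []
    for i, xi in enumerate(points):
        angles = []
        for j, xj in enumerate(points):
            if j != i:
                cos_ij = sum(a * b for a, b in zip(xi, xj))
                # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
                angles.append((math.acos(max(-1.0, min(1.0, cos_ij))), j))
        angles.sort()
        neighbors.append([j for _, j in angles[:k]])
    return neighbors

pts = normalize_columns([[1, 0], [1, 0.1], [0, 1], [0.1, 1]])
print(nearest_neighbors(pts, k=1))  # [[1], [0], [3], [2]]
```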

Once the distance matrix between the points is generated,
we can find, for each point $x_i$, a set of $k + 1 \ge d$ points
$\{x_i, x_{i_1}, \ldots, x_{i_k}\}$ consisting of $x_i$ and its k closest neighbors.
Then we generate a d-dimensional subspace that is nearest (in
the least squares sense) to the data $\{x_i, x_{i_1}, \ldots, x_{i_k}\}$. This is
accomplished by using the SVD

$$X = [x_i \; x_{i_1} \; \cdots \; x_{i_k}] = A \Sigma B^t. \qquad \text{(III.3)}$$

Let $A_i$ denote the matrix of the first d columns of A
associated with $x_i$. Then, the column space $C(A_i)$ is the
d-dimensional subspace nearest to $\{x_i, x_{i_1}, \ldots, x_{i_k}\}$. Local
subspace estimation corresponds to Steps 6 to 10 in Algorithm
1.
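The least-squares subspace in (III.3) is spanned by the first d left singular vectors of X, which are also the top d eigenvectors of $XX^t$. To keep the sketch dependency-free, the code below obtains them by orthogonal (subspace) iteration on $XX^t$ with Gram-Schmidt re-orthonormalization instead of calling an SVD routine; in practice a library SVD such as `numpy.linalg.svd` would be the natural choice. The sketch assumes rank(X) ≥ d and d ≤ m, and the deterministic starting vectors are an arbitrary choice made here.

```python
import math
from typing import List

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors: List[Vector], tol: float = 1e-12) -> List[Vector]:
    """Orthonormalize vectors in order, dropping any that are (numerically) dependent."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [a - c * b for a, b in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([a / norm for a in w])
    return basis

def local_subspace(points: List[Vector], d: int, iters: int = 100) -> List[Vector]:
    """Return d orthonormal vectors spanning the least-squares nearest d-dim subspace.

    Orthogonal iteration on M = X X^t (the points are the columns of X); its dominant
    d-dimensional invariant subspace is spanned by the first d left singular vectors of X.
    """
    m = len(points[0])
    M = [[sum(p[a] * p[b] for p in points) for b in range(m)] for a in range(m)]
    # Deterministic, generically independent start vectors (Hilbert-type columns).
    Q = [[1.0 / (a + s + 1) for a in range(m)] for s in range(d)]
    for _ in range(iters):
        Q = gram_schmidt([[dot(row, q) for row in M] for q in Q])
    return Q

def distance_to_subspace(x: Vector, basis: List[Vector]) -> float:
    """Euclidean norm of x - A A^t x, where the columns of A are the basis vectors."""
    residual = list(x)
    for q in basis:
        c = dot(x, q)
        residual = [r - c * b for r, b in zip(residual, q)]
    return math.sqrt(dot(residual, residual))

# A point and three neighbors from the plane spanned by (1, 1, 0) and (0, 1, 1).
basis = local_subspace([[1, 1, 0], [0, 1, 1], [1, 2, 1], [1, 0, -1]], d=2)
print(distance_to_subspace([1, 2, 1], basis))   # close to 0
```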

Construction of Binary Similarity Matrix: So far, we
have associated a local subspace $S_i$ with each point $x_i$. Ideally,
the points that belong to the same subspace as $x_i$, and only those
points, should have zero distance from $S_i$. This
suggests computing the distance of each point $x_j$ to the local
subspace $S_i$ and forming a distance matrix H.

The distance matrix H is generated as
$$H = (d_{ij}) = \left(\|x_j - A_i A_i^t x_j\|_p + \|x_i - A_j A_j^t x_i\|_p\right)/2.$$

A convenient choice of p is 2. Note that as $d_{ij}$ decreases, the
probability of having $x_j$ on the same subspace as $x_i$ increases.
Moreover, for p = 2, $\|x_j - A_i A_i^t x_j\|_2$ is the Euclidean distance
of $x_j$ to the subspace associated with $x_i$.
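Given an orthonormal basis for each local subspace, the symmetrized distance matrix follows directly from its definition. In the sketch below each $A_i$ is supplied as a list of orthonormal basis vectors (for instance, from a least-squares fit); the names `residual_norm` and `distance_matrix` and the toy data are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Basis = List[Vector]

def residual_norm(x: Vector, basis: Basis, p: float = 2.0) -> float:
    """||x - A A^t x||_p, where the columns of A are the orthonormal basis vectors."""
    r = list(x)
    for q in basis:
        c = sum(a * b for a, b in zip(x, q))
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return math.fsum(abs(v) ** p for v in r) ** (1.0 / p)

def distance_matrix(points: List[Vector], bases: List[Basis], p: float = 2.0) -> List[List[float]]:
    """H = (d_ij), d_ij = (||x_j - A_i A_i^t x_j||_p + ||x_i - A_j A_j^t x_i||_p) / 2."""
    n = len(points)
    return [[(residual_norm(points[j], bases[i], p) + residual_norm(points[i], bases[j], p)) / 2
             for j in range(n)] for i in range(n)]

# x_0, x_1 lie on the x-axis (local subspace span{e_1}); x_2 lies on the y-axis.
H = distance_matrix([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]],
                    [[[1.0, 0.0]], [[1.0, 0.0]], [[0.0, 1.0]]])
print(H)  # [[0.0, 0.0, 1.0], [0.0, 0.0, 1.5], [1.0, 1.5, 0.0]]
```

Averaging the two one-sided distances makes H symmetric, which is why the thresholded similarity matrix built from it is symmetric as well.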

Since we are not in the ideal case, a point $x_j$ that belongs
to the same subspace as $x_i$ may have non-zero distance to $S_i$.
However, this distance is likely to be small compared to the
distance between $x_j$ and $S_i$ when $x_j$ and $x_i$ do not belong to
the same subspace. This suggests that we compute a threshold
that will distinguish between these two cases and transform
the distance matrix into a binary matrix in which a one in the
(i, j) entry means $x_i$ and $x_j$ are likely to belong to the same
subspace, whereas an entry of zero means $x_i$ and $x_j$ are not
likely to belong to the same subspace.

To do this, we convert the distance matrix $H = (d_{ij})_{N \times N}$
into a binary similarity matrix $S = (s_{ij})$. This is done by
applying a data-driven thresholding as follows:

1) Create a vector h that contains the sorted entries of
$H_{N \times N}$ from smallest to highest values. Scale h so that
its smallest value is zero and its largest value is one.

2) Set the threshold η to the value of the T-th entry of the
sorted vector h, where T is such that $\|\chi_{[T, N^2]} - h\|_2$
is minimized, and where $\chi_{[T, N^2]}$ is the characteristic
function of the discrete set $[T, N^2]$. If the numbers of
points in the n subspaces are approximately equal, then
we would expect about N/n points in each subspace, and
we would expect $N^2/n$ small entries (zero entries ideally).
However, this may not be the case in general. For this
reason, we compute the data-driven threshold η that
distinguishes the small entries from the large entries.

3) Create a similarity matrix S from H such that all entries
of H less than the threshold η are set to 1 and the others
are set to 0.
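The three thresholding steps can be sketched as follows. The cost of a candidate index T is $\sum_{t<T} h_t^2 + \sum_{t \ge T} (1 - h_t)^2$, and prefix and suffix sums let us evaluate it for every T in linear time after sorting. Returning the threshold in the original units of H (the T-th smallest entry of H rather than of the rescaled h) is a choice made here so that step 3 can compare against H directly; the sketch assumes H is not constant, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def data_driven_threshold(H: Matrix) -> Tuple[int, float]:
    """Return (T, eta): the 1-based index T minimizing ||chi_[T, N^2] - h||_2 and the
    threshold eta, i.e. the T-th smallest entry of H (in the units of H)."""
    values = sorted(v for row in H for v in row)
    lo, hi = values[0], values[-1]
    if hi == lo:
        raise ValueError("H is constant; no threshold can be chosen")
    h = [(v - lo) / (hi - lo) for v in values]      # rescaled so min(h) = 0, max(h) = 1
    n = len(h)
    prefix = [0.0] * (n + 1)                        # prefix[t] = h_1^2 + ... + h_t^2
    for t in range(n):
        prefix[t + 1] = prefix[t] + h[t] ** 2
    suffix = [0.0] * (n + 1)                        # suffix[t] = (1 - h)^2 summed over positions t+1..n
    for t in range(n - 1, -1, -1):
        suffix[t] = suffix[t + 1] + (1.0 - h[t]) ** 2
    # chi_[T, n] is 0 before position T and 1 from position T on.
    T = min(range(1, n + 1), key=lambda t: prefix[t - 1] + suffix[t - 1])
    return T, values[T - 1]

def binary_similarity(H: Matrix, eta: float) -> List[List[int]]:
    """s_ij = 1 if d_ij < eta (likely the same subspace), else 0."""
    return [[1 if v < eta else 0 for v in row] for row in H]

H = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.5], [1.0, 1.5, 0.0]]
T, eta = data_driven_threshold(H)
print(T, eta, binary_similarity(H, eta))  # 6 1.0 [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
```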

The construction of the binary similarity matrix corresponds to Steps 11
to 17 in Algorithm 1. In [29], Yan and Pollefeys use the chordal
distance (as defined in [33]) between the subspaces $\mathcal{F}(x_i)$ and
$\mathcal{G}(x_j)$ as a measure of the distance between points $x_i$ and $x_j$,

$$d_c(\mathcal{F}, \mathcal{G}) = \left(\sum_{i=1}^{p} \sin^2 \theta_i\right)^{1/2}, \qquad \text{(III.4)}$$

where $\{\theta_i\}_{i=1}^{p}$ are the principal angles between the p-dimensional
local subspaces $\mathcal{F}$ and $\mathcal{G}$, with $\theta_1 \le \cdots \le \theta_p$. In this approach,
the distance between any pair of points from $\mathcal{F}$ and $\mathcal{G}$ is the
same. We instead find distances between points and local subspaces,
so our approach distinguishes different points from the same
subspace. To see this, let $v \in \operatorname{span}\{Q_{\mathcal{F}}\}$ with $\|v\|_2 = 1$, where the
columns of $Q_{\mathcal{F}}$ form an orthonormal basis for $\mathcal{F}$. Thus
$v = Q_{\mathcal{F}} x$ for some x with $\|x\|_2 = 1$. Let $Q_{\mathcal{G}}$ form an orthonormal
basis for $\mathcal{G}$; then the squared Euclidean distance from v to $\mathcal{G}$
is given by



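$$\|v - P_{\mathcal{G}}(v)\|_2^2 = \|Q_{\mathcal{F}} x - Q_{\mathcal{G}} Q_{\mathcal{G}}^t Q_{\mathcal{F}} x\|_2^2 = \|x\|_2^2 - x^t Q_{\mathcal{F}}^t Q_{\mathcal{G}} Q_{\mathcal{G}}^t Q_{\mathcal{F}} x = x^t Y Y^t x - x^t Y \Sigma Z^t Z \Sigma Y^t x = x^t Y Y^t x - x^t Y \Sigma^2 Y^t x = \sum_{i=1}^{p} z_i^2 (1 - \sigma_i^2),$$

where $Y \Sigma Z^t$ is the SVD of $Q_{\mathcal{F}}^t Q_{\mathcal{G}}$ and $z := Y^t x$. Thus,
using the relation $\cos\theta_i = \sigma_i$ between principal angles and
singular values [34], we get

$$d^2(v, \mathcal{G}) = \sum_{i=1}^{p} z_i^2 \sin^2(\theta_i). \qquad \text{(III.5)}$$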



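Hence, our approach discriminates among the distances from different points
of $\mathcal{F}$ to the subspace $\mathcal{G}$. Moreover, since $\|z\|_2 = \|x\|_2 = 1$, we have
$\sum_{i=1}^{p} z_i^2 \sin^2(\theta_i) \le \sum_{i=1}^{p} \sin^2(\theta_i)$, and therefore $d_c$ is more sensitive to noise.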
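Equation (III.5) can be sanity-checked numerically. In the illustrative configuration below (not taken from the paper), $\mathcal{F} = \operatorname{span}\{e_1, e_2\}$ and $\mathcal{G} = \operatorname{span}\{e_1, (0, \cos\theta, \sin\theta)\}$ in $\mathbb{R}^3$, so the principal angles are 0 and θ, $Q_{\mathcal{F}}^t Q_{\mathcal{G}}$ is already diagonal, and Y = I. The distance of $v = (\cos\phi, \sin\phi, 0)$ to $\mathcal{G}$ then varies with φ as $|\sin\phi| \sin\theta$, whereas the chordal distance $(\sum_i \sin^2\theta_i)^{1/2} = \sin\theta$ is the same for every pair of points.

```python
import math
from typing import List

def distance_to_span(v: List[float], basis: List[List[float]]) -> float:
    """Euclidean distance from v to the span of an orthonormal basis."""
    r = list(v)
    for q in basis:
        c = sum(a * b for a, b in zip(v, q))
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return math.sqrt(sum(x * x for x in r))

theta = 0.7                                      # principal angles between F and G: 0 and theta
G = [[1.0, 0.0, 0.0], [0.0, math.cos(theta), math.sin(theta)]]
chordal = math.sqrt(math.sin(0.0) ** 2 + math.sin(theta) ** 2)   # identical for every pair of points

for phi in (0.0, 0.4, 1.2):
    v = [math.cos(phi), math.sin(phi), 0.0]      # unit vector of F with z = (cos phi, sin phi)
    # Eq. (III.5): d^2 = z_1^2 sin^2(0) + z_2^2 sin^2(theta)
    predicted = math.sqrt((math.cos(phi) * math.sin(0.0)) ** 2 + (math.sin(phi) * math.sin(theta)) ** 2)
    print(f"phi={phi}: d={distance_to_span(v, G):.4f}, Eq. (III.5)={predicted:.4f}, d_c={chordal:.4f}")
```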



Using Eq. (III.5) and $\|z\|_2 = 1$, we get $\sin\theta_1 \le d \le \sin\theta_p$. Assuming
a uniform distribution of samples from $\mathcal{F}$ and $\mathcal{G}$, h can be
approximated by the function depicted in Fig. 1. The goal
is to find the threshold at the jump discontinuity T from 0
to $\sin\theta_1$. Our method minimizes the highlighted area. Under
this model, a simple computation shows that our data-driven
thresholding algorithm picks $T_d = T$ for $\sin\theta_1 / \sin\theta_p > 1/2$,
e.g., if $\theta_1 > 30^\circ$. In other situations, our algorithm overshoots
in estimating the threshold index, depending on $\theta_1$ and $\theta_p$.




Fig. 1. Linear modeling for h.



Segmentation: The last step is to use the similarity matrix
S to segment the data. To do this, we first normalize the
rows of S using the $l_1$-norm, i.e., $\tilde{S} = D^{-1} S$, where D is the
diagonal matrix with $d_{ii} = \sum_{j=1}^{N} s_{ij}$. Note that, unlike S, $\tilde{S}$ is
in general not symmetric. $\tilde{S}$ is related to the random walk Laplacian $L_{rw}$
($\tilde{S} = I - L_{rw}$) [?]. Although other $l_p$ normalizations are possible
for $p \ge 1$, because of the geometry of the $l_1$ ball,
$l_1$-normalization brings outliers closer to the cluster clouds
(distances of outliers decrease monotonically as p decreases
to 1). Since SVD (which will be used next) is associated
with $l_2$ minimization, it is sensitive to outliers. Therefore, $l_1$
normalization works best when SVD is used.
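The row normalization $\tilde{S} = D^{-1} S$ divides each row of S by its sum. In the sketch below, the small symmetric S is made up for illustration and exact fractions keep the output readable; the example also shows why $\tilde{S}$ is generally not symmetric, since rows with different numbers of ones are scaled differently.

```python
from fractions import Fraction
from typing import List

def l1_normalize_rows(S: List[List[int]]) -> List[List[Fraction]]:
    """Return D^{-1} S, where D is the diagonal matrix of the row sums of S."""
    out = []
    for row in S:
        total = sum(row)
        if total == 0:
            raise ValueError("a zero row would make D singular")
        out.append([Fraction(v, total) for v in row])
    return out

S = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]     # symmetric binary similarity matrix
S_tilde = l1_normalize_rows(S)
print(S_tilde[0][1], S_tilde[1][0])       # 1/2 1/3, so S_tilde is not symmetric
```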

Observe that the initial data segmentation problem has now
been converted to the segmentation of n 1-dimensional subspaces
from the rows of $\tilde{S}$. This is because, in the ideal case, from the
construction of S, if $x_i$ and $x_j$ are in the same subspace, the
i-th and j-th rows of S are equal. Since there are n subspaces,
there will be n 1-dimensional subspaces.

Now, the problem is again a subspace segmentation problem,
but this time the data matrix is $\tilde{S}$ with each row as a data
point. Also, each subspace is 1-dimensional and there are n
subspaces. Therefore, we can apply SVD again to obtain

$$\tilde{S}^t = U_n \Sigma_n (V_n)^t.$$

Using Proposition 1, it can be shown that $\Sigma_n (V_n)^t$ can replace
$\tilde{S}^t$, and we cluster the columns of $\Sigma_n (V_n)^t$, which are the coordinates of the
projection of the columns of $\tilde{S}^t$ onto the span of $U_n$. Since the problem is only the
segmentation of subspaces of dimension 1, we can use any
traditional clustering algorithm, such as k-means, to cluster
the data points. The segmentation corresponds to Steps 18 to
20 in Algorithm 1.
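For the final step, any standard k-means will do. The plain-Python sketch below is a simplification: it runs Lloyd's k-means, with a deterministic farthest-point initialization chosen for this example, directly on the rows of $\tilde{S}$ and omits the SVD projection onto the span of $U_n$. In the ideal case, rows from the same subspace are identical and remain identical after that projection, so the simplification does not change the grouping there; with noisy data the projection step described above is the one the method prescribes.

```python
from typing import List

Vector = List[float]

def kmeans(rows: List[Vector], n_clusters: int, iters: int = 50) -> List[int]:
    """Lloyd's k-means; returns a cluster label in 0..n_clusters-1 for every row."""
    def sqdist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Deterministic farthest-point initialization: start from the first row, then
    # repeatedly add the row farthest from all centers chosen so far.
    centers = [list(rows[0])]
    while len(centers) < n_clusters:
        centers.append(list(max(rows, key=lambda r: min(sqdist(r, c) for c in centers))))

    labels = [0] * len(rows)
    for _ in range(iters):
        new_labels = [min(range(n_clusters), key=lambda c: sqdist(r, centers[c])) for r in rows]
        for c in range(n_clusters):
            members = [r for r, lab in zip(rows, new_labels) if lab == c]
            if members:                           # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Rows of an l1-normalized similarity matrix for two groups of two points each.
S_tilde = [[0.5, 0.5, 0.0, 0.0], [0.4, 0.4, 0.2, 0.0],
           [0.0, 0.0, 0.5, 0.5], [0.0, 0.2, 0.4, 0.4]]
print(kmeans(S_tilde, n_clusters=2))  # [0, 0, 1, 1]
```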

IV. Experimental Results 

A. The Hopkins 155 Dataset 

The Hopkins 155 Dataset [18] was created as a benchmark 
database to evaluate motion segmentation algorithms. It con- 
tains two (2) and three (3) motion sequences. There are three 
(3) groups of video sequences in the dataset: (1) 38 sequences 
of outdoor traffic scenes captured by a moving camera, (2) 
104 indoor checkerboard sequences captured by a handheld
camera, and (3) 13 sequences of articulated motions such as 
head and face motions. Cornerness features that are extracted 
and tracked across the frames are provided along with the 
dataset. The ground truth segmentations are also provided for 
comparison. 

B. Results 

Tables I, II, and III display some of the experimental
results for the Hopkins 155 Dataset. Our Nearness to Local
Subspace (NLS) approach has been compared with six (6)
motion segmentation algorithms: (1) GPCA [?], (2) RANSAC
[23], (3) Local Subspace Affinity (LSA) [29], (4) MSL [?],
[21], (5) Agglomerative Lossy Compression (ALC) [35], and
(6) Sparse Subspace Clustering (SSC) [12]. An evaluation of
those algorithms is presented in [12], with a minor error in the
tabulated results for the articulated three motion analysis of SSC-N:
the error is listed there as 1.42%, and it is replaced with 1.60% in Table II.
SSC-B and SSC-N correspond to Bernoulli and Normal
random projections, respectively [12]. In Tables I, II, and III we used
k = 3 neighbors. Since each point is drawn
from a 4-dimensional subspace, a minimum of 3 neighbors
is needed to fit a local subspace for each point. Using the
same assumption as the algorithms that we compare with,
we take the rank of the data matrix to be 8 for two motions
and 12 for three motions. Table I displays the misclassification
rates for the two motion video sequences. NLS outperforms
all of the algorithms for the checkerboard sequences, which
are linearly independent motions. The overall misclassification
rate is 0.57%, which is 24% better than that of the next best
algorithm (SSC-B). Table II shows the misclassification rates for the three
motion sequences. NLS has a 1.31% misclassification rate and
performs 47% better than the next best algorithm (i.e., SSC-N).



Table III presents the misclassification rates for all of the video
sequences. Our algorithm NLS (with a 0.76% misclassification
rate) performs 39% better than the next best algorithm (i.e.,
SSC-N). In general, our algorithm outperforms SSC-N, which
is given as the best algorithm for the two and three motion
sequences together.
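The percentages quoted above follow from the average error rates in Tables I, II, and III: the correct-segmentation rates in the abstract are 100% minus the NLS error, and the relative improvement is the drop in error divided by the error of the next best method. A short check:

```python
def relative_improvement(ours: float, best_other: float) -> float:
    """Fractional reduction in error rate relative to the next best method."""
    return (best_other - ours) / best_other

# Average misclassification rates (%) taken from Tables I, II and III.
cases = {
    "two motions": (0.57, 0.75),    # NLS vs. SSC-B (Table I)
    "three motions": (1.31, 2.45),  # NLS vs. SSC-N (Table II)
    "all sequences": (0.76, 1.24),  # NLS vs. SSC-N (Table III)
}
for name, (nls, best) in cases.items():
    print(f"{name}: {100 - nls:.2f}% correct, {relative_improvement(nls, best):.0%} lower error")
# two motions: 99.43% correct, 24% lower error
# three motions: 98.69% correct, 47% lower error
# all sequences: 99.24% correct, 39% lower error
```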



Table IV shows the performance of the data-driven threshold
index $T_d$ compared to various other possible thresholds. We
provide the results for ±20%, ±10%, and ±5% deviations
from $T_d$.

Table V displays the robustness of the algorithm with
respect to the number of neighbors k. The second portion of
the table excludes one pathological sequence from the two-motion
checkerboard sequences for k = 4 and k = 5. When k is set to 3,
which is the minimum number of neighbors required, the
algorithm performs better.



Table VI displays the increase in the performance of the
original LSA algorithm when our distance/similarity and
segmentation techniques are applied separately. Both of them
improve the performance of the algorithm; however, the new
distance and similarity combination contributes more than the
new segmentation technique.

Recently, the Low-Rank Representation (LRR) [?], [16]
was applied to the Hopkins 155 Dataset, and it generated an
error rate of 3.16%. The authors state that this error rate can
be reduced to 0.87% by using a variation of LRR with some
additional adjustment of a certain parameter.



V. Conclusions 

The NLS approach described in this paper can handle noise
effectively, but it works only in special cases of subspace
segmentation problems (i.e., subspaces of equal and known
dimensions). Our approach is based on the computation of a 
binary similarity matrix for the data points. A local subspace 
is first estimated for each data point. Then, a distance matrix 
is generated by computing the distances between the local 
subspaces and points. The distance matrix is converted to 
the similarity matrix by applying a data-driven threshold. The 
problem is then transformed to segmentation of subspaces of 
dimension 1 instead of subspaces of dimension d. The algo- 
rithm was applied to the Hopkins 155 Dataset and generated 
the best results to date. 
TABLE I
% SEGMENTATION ERRORS FOR SEQUENCES WITH TWO MOTIONS.

Checker (78)      GPCA    LSA     RANSAC  MSL     ALC     SSC-B   SSC-N   NLS
Average           6.09%   2.57%   6.52%   4.46%   1.55%   0.83%   1.12%   0.23%
Median            1.03%   0.27%   1.75%   0.00%   0.29%   0.00%   0.00%   0.00%

Traffic (31)      GPCA    LSA     RANSAC  MSL     ALC     SSC-B   SSC-N   NLS
Average           1.41%   5.43%   2.55%   2.23%   1.59%   0.23%   0.02%   1.40%
Median            0.00%   1.48%   0.21%   0.00%   1.17%   0.00%   0.00%   0.00%

Articulated (11)  GPCA    LSA     RANSAC  MSL     ALC     SSC-B   SSC-N   NLS
Average           2.88%   4.10%   7.25%   7.23%   10.70%  1.63%   0.62%   1.77%
Median            0.00%   1.22%   2.64%   0.00%   0.95%   0.00%   0.00%   0.88%

All (120 seq)     GPCA    LSA     RANSAC  MSL     ALC     SSC-B   SSC-N   NLS
Average           4.59%   3.45%   5.56%   4.14%   2.40%   0.75%   0.82%   0.57%
Median            0.38%   0.59%   1.18%   0.00%   0.43%   0.00%   0.00%   0.00%



TABLE II
% SEGMENTATION ERRORS FOR SEQUENCES WITH THREE MOTIONS.

Checker (26)      GPCA    LSA     RANSAC  MSL     ALC     SSC-B   SSC-N   NLS
Average           31.95%  5.80%   25.78%  10.38%  5.20%   4.49%   2.97%   0.87%
Median            32.93%  1.77%   26.00%  4.61%   0.67%   0.54%   0.27%   0.35%

Traffic (7)       GPCA    LSA     RANSAC  MSL     ALC     SSC-B   SSC-N   NLS
Average           19.83%  25.07%  12.83%  1.80%   7.75%   0.61%   0.58%   1.86%
Median            19.55%  23.79%  11.45%  0.00%   0.49%   0.00%   0.00%   1.53%

Articulated (2)   GPCA    LSA     RANSAC  MSL     ALC     SSC-B   SSC-N   NLS
Average           16.85%  7.25%   21.38%  2.71%   21.08%  1.60%   1.60%   5.12%
Median            16.85%  7.25%   21.38%  2.71%   21.08%  1.60%   1.60%   5.12%

All (35 seq)      GPCA    LSA     RANSAC  MSL     ALC     SSC-B   SSC-N   NLS
Average           28.66%  9.73%   22.94%  8.23%   6.69%   3.55%   2.45%   1.31%
Median            28.26%  2.33%   22.03%  1.76%   0.67%   0.25%   0.20%   0.45%



All (155 seq)     GPCA    LSA     RANSAC  MSL     ALC     SSC-B   SSC-N   NLS
Average           10.34%  4.94%   9.76%   5.03%   3.56%   1.45%   1.24%   0.76%
Median            2.54%   0.90%   3.21%   0.00%   0.50%   0.00%   0.00%   0.20%



TABLE III 

% SEGMENTATION ERRORS FOR ALL SEQUENCES. 
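Tables I-III report a per-sequence "% segmentation error", summarized by its average and median over the sequences in each group. This section does not define the metric. The sketch below assumes the convention commonly used with the Hopkins 155 benchmark: the percentage of points that remain misclassified after the best one-to-one matching between estimated and true cluster labels. The function name is hypothetical. Brute force over label permutations is adequate here because each sequence contains only two or three motions. For many clusters, an assignment solver such as the Hungarian algorithm would be the usual choice.

```python
from itertools import permutations


def segmentation_error(true_labels, predicted_labels):
    """Percentage of points misclassified under the best one-to-one label matching."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(predicted_labels))
    # Pad with None so that surplus predicted clusters can map to "no true cluster".
    targets = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        correct = sum(mapping[p] == t for t, p in zip(true_labels, predicted_labels))
        best = max(best, correct)
    return 100.0 * (len(true_labels) - best) / len(true_labels)


# Cluster names are arbitrary: swapping labels 0 and 1 matches all but the last point.
print(segmentation_error([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 1]))
```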



All-2 (120 seq)   Data-driven T_d  0.8T_d   0.9T_d   0.95T_d  1.05T_d  1.10T_d  1.20T_d
Average           0.57%            0.95%    1.17%    0.62%    0.58%    1.05%    0.77%
Median            0.00%            0.00%    0.35%    2.27%    2.27%    0.00%    0.00%

All-3 (35 seq)    Data-driven T_d  0.8T_d   0.9T_d   0.95T_d  1.05T_d  1.10T_d  1.20T_d
Average           1.31%            4.39%    3.18%    1.42%    1.20%    1.24%    2.06%
Median            0.45%            0.60%    0.57%    0.46%    0.45%    0.42%    0.37%

All (155 seq)     Data-driven T_d  0.8T_d   0.9T_d   0.95T_d  1.05T_d  1.10T_d  1.20T_d
Average           0.76%            1.84%    1.67%    0.83%    0.74%    1.10%    1.11%
Median            0.20%            0.00%    0.00%    0.20%    0.20%    0.18%    0.19%



TABLE IV 



% COMPARISON OF THE DATA-DRIVEN THRESHOLD INDEX T_d WITH OTHER CHOICES.
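Table IV replaces the data-driven threshold T_d by c*T_d for c between 0.8 and 1.20. The sketch below illustrates only the mechanics of such a sweep on a toy distance matrix. The rule that produces T_d is not reproduced, so `t_d` is an arbitrary stand-in, and the helper names are hypothetical. Lowering c makes the binary similarity matrix sparser, and raising it makes the matrix denser. Intuitively, a matrix that is too sparse can split points of one subspace apart, while one that is too dense can link points of different subspaces.

```python
SCALES = (0.8, 0.9, 0.95, 1.0, 1.05, 1.10, 1.20)


def threshold_matrix(dist, threshold):
    """Entry (i, j) is 1 when dist[i][j] <= threshold, else 0."""
    return [[1 if x <= threshold else 0 for x in row] for row in dist]


def density_sweep(dist, t_d, scales=SCALES):
    """Number of ones in the thresholded matrix for each scaled threshold c * t_d."""
    return {c: sum(map(sum, threshold_matrix(dist, c * t_d))) for c in scales}


# Toy distance matrix; t_d = 1.0 is an arbitrary stand-in for the data-driven value.
dist = [[0.0, 0.85, 1.02],
        [0.85, 0.0, 1.15],
        [1.02, 1.15, 0.0]]
for c, ones in density_sweep(dist, 1.0).items():
    print(f"{c:.2f} * T_d -> {ones} ones")
```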
                    ALL SEQ INCLUDED        1 SEQ EXCLUDED
Checker-2 (78)      k=5     k=4     k=3     k=5     k=4
Average             0.65%   1.59%   0.23%   0.23%   0.97%
Median              0.00%   0.00%   0.00%   0.00%   0.00%

Traffic-2 (31)      k=5     k=4     k=3     k=5     k=4
Average             1.56%   1.66%   1.40%   1.56%   1.66%
Median              0.00%   0.00%   0.00%   0.00%   0.00%

Articulated-2 (11)  k=5     k=4     k=3     k=5     k=4
Average             2.44%   2.33%   1.77%   2.44%   2.33%
Median              ?       ?       0.88%   ?       ?

All-2 (120 seq)     k=5     k=4     k=3     k=5     k=4
Average             1.04%   1.75%   0.57%   0.77%   1.35%
Median              0.00%   0.00%   0.00%   0.00%   0.00%

Checker-3 (26)      k=5     k=4     k=3     k=5     k=4
Average             0.44%   0.43%   0.87%   0.44%   0.43%
Median              0.24%   0.22%   0.35%   0.24%   0.22%

Traffic-3 (7)       k=5     k=4     k=3     k=5     k=4
Average             6.59%   7.18%   1.86%   6.59%   7.18%
Median              1.81%   4.37%   1.53%   1.81%   4.37%

Articulated-3 (2)   k=5     k=4     k=3     k=5     k=4
Average             20.54%  4.05%   5.12%   20.54%  4.05%
Median              20.54%  4.05%   5.12%   20.54%  4.05%

All-3 (35 seq)      k=5     k=4     k=3     k=5     k=4
Average             2.82%   1.98%   1.31%   2.82%   1.98%
Median              0.65%   0.47%   0.45%   0.65%   0.47%

All (155 seq)       k=5     k=4     k=3     k=5     k=4
Average             1.50%   1.81%   0.76%   1.30%   1.50%
Median              0.21%   0.00%   0.20%   0.21%   0.00%



TABLE V 



% SEGMENTATION ERRORS - NLS ALGORITHM FOR VARIOUS k. 
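In Table V, the k=5 and k=4 averages in the right-hand column group differ from those in the left-hand group only for Checker-2 and for the All-2 and All rows that contain it. Assume that exactly one Checker-2 sequence was removed and that each Average is a plain mean over sequences. Under those assumptions, the error of the removed sequence can be backed out as n*(mean with all sequences) - (n-1)*(mean without it). The helper name below is hypothetical. This is a consistency check on the reported numbers, not a result stated in the paper.

```python
def implied_excluded_error(mean_all, mean_excluded, n):
    """Error (%) of one dropped sequence, given the group means before and after removal."""
    return n * mean_all - (n - 1) * mean_excluded


# (number of sequences, average with all sequences, average with one excluded), from Table V.
ROWS = {
    "Checker-2, k=5": (78, 0.65, 0.23),
    "Checker-2, k=4": (78, 1.59, 0.97),
    "All-2, k=5": (120, 1.04, 0.77),
    "All-2, k=4": (120, 1.75, 1.35),
    "All, k=5": (155, 1.50, 1.30),
    "All, k=4": (155, 1.81, 1.50),
}

for name, (n, m_all, m_exc) in ROWS.items():
    print(f"{name}: {implied_excluded_error(m_all, m_exc, n):.2f}%")
```

All three rows agree, within rounding of the two-decimal table entries, on an excluded sequence with roughly 32 to 33% error for k=5 and roughly 49 to 50% error for k=4. This agreement supports reading the right-hand columns as the same runs with a single Checker-2 sequence left out.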



Checker-2 (78)      LSA (Original)  LSA (New Dist/Similarity)  LSA (New Segmentation)
Average             2.57%           0.97%                      1.71%
Median              0.27%           0.00%                      0.00%

Traffic-2 (31)      LSA (Original)  LSA (New Dist/Similarity)  LSA (New Segmentation)
Average             5.43%           1.59%                      4.99%
Median              1.48%           1.11%                      0.65%

Articulated-2 (11)  LSA (Original)  LSA (New Dist/Similarity)  LSA (New Segmentation)
Average             4.10%           2.10%                      4.26%
Median              1.22%           0.43%                      1.21%

All-2 (120 seq)     LSA (Original)  LSA (New Dist/Similarity)  LSA (New Segmentation)
Average             3.45%           1.22%                      2.27%
Median              0.59%           0.00%                      0.35%

Checker-3 (26)      LSA (Original)  LSA (New Dist/Similarity)  LSA (New Segmentation)
Average             5.80%           2.66%                      4.67%
Median              1.77%           0.30%                      0.91%

Traffic-3 (7)       LSA (Original)  LSA (New Dist/Similarity)  LSA (New Segmentation)
Average             25.07%          6.38%                      24.46%
Median              23.79%          1.28%                      31.20%

Articulated-3 (2)   LSA (Original)  LSA (New Dist/Similarity)  LSA (New Segmentation)
Average             7.25%           6.18%                      7.25%
Median              7.25%           6.18%                      7.25%

All-3 (35 seq)      LSA (Original)  LSA (New Dist/Similarity)  LSA (New Segmentation)
Average             9.73%           2.45%                      8.78%
Median              2.33%           0.20%                      1.94%

All (155 seq)       LSA (Original)  LSA (New Dist/Similarity)  LSA (New Segmentation)
Average             4.94%           1.84%                      3.96%
Median              0.90%           0.18%                      0.61%



TABLE VI 



% SEGMENTATION ERRORS FOR LSA WITH VARIOUS PARAMETERS. 



